The tests

PCMark2002 contains a number of tests divided into different categories. There are CPU tests, Memory tests, HDD tests, the Crunch test, Windows XP 2D tests, Video Performance and Quality tests, and a Battery test. The Windows XP 2D tests and the video tests are only available in the Pro version of PCMark2002.

CPU tests

CPU Tests - Overall
The CPU tests stress primarily the CPU, even though they also depend somewhat on memory speed. They perform a number of small processor-intensive tasks typical of home and office PC applications. There are tests using both integer and floating-point operations.

These tests accurately reflect the tasks performed with PCs in home and office use, since the test algorithms and source code are in common use. On the other hand, home and office PC usage is a very wide concept, and different applications stress the PC in very different ways. For example, large applications might reserve a major part of the system memory just for the application code. The application data might then not fit in system memory, and hard disk access might occur during application execution, which naturally affects performance considerably.

An important issue in CPU benchmarking is the choice of compiler. PCMark2002 includes CPU test compilations from both the MS Visual C++ default compiler and the Intel C++ Compiler. There is one Intel compilation optimized for CPUs with SSE support and one optimized for SSE2 support. Some tests run faster when compiled with the default compiler, while others get better results with the Intel compiler. PCMark2002 therefore uses the optimal compilations for Intel CPUs supporting SSE, Intel CPUs supporting SSE2, and AMD CPUs supporting SSE. All other CPUs run all tests compiled with the default compiler.

CPU Test - JPEG decompression
JPEG image decompression is a very typical CPU task when browsing the web, reading documents with images, or doing anything else involving images. For 10 seconds, the JPEG test sequentially decompresses 3 different images with file sizes of 149 kB, 771 kB, and 889 kB. The result unit of the test is Mpixels/s, in other words, how many image pixels, on average, were decompressed per second.

Technical details:

The standard JPEG library (version 6b) from the Independent JPEG Group (www.ijg.org) is used in this test. The source code for the decompressor can be found on that website for further analysis of the tasks involved in this test. An image file is completely loaded into memory before decompression. The JPEG decoding pipeline uses a fixed-point IDCT and the RGB-24 output pixel format, so this test uses only integer operations. The image is decoded one scan line at a time into a buffer one scan line in length, to provide cache-coherent memory usage.
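The measurement pattern shared by the CPU tests (run a workload repeatedly for a fixed wall-clock interval, then divide the units processed by the elapsed time) can be sketched as follows. This is an illustrative harness, not PCMark2002's actual code; the no-op workload and the 1024x768 pixel count are stand-ins for a real call into the IJG decompressor.

```python
import time

def throughput(work, units_per_call, duration=1.0):
    """Run `work` repeatedly for `duration` seconds of wall-clock time and
    return the average number of units processed per second."""
    start = time.perf_counter()
    deadline = start + duration
    units = 0
    while True:
        work()
        units += units_per_call
        now = time.perf_counter()
        if now >= deadline:
            break
    return units / (now - start)

# Stand-in for decoding one hypothetical 1024x768 test image; a real harness
# would invoke the JPEG decompressor here and count its output pixels.
PIXELS = 1024 * 768

rate_mpixels = throughput(lambda: None, PIXELS, duration=0.1) / 1e6
```

The same loop, with different units (MB for the compression tests, search passes for the text test), underlies most of the 10-second CPU measurements described in this section.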

CPU Test - Zlib compression & decompression
File compression and decompression are also very common tasks in everyday PC usage, both at the office and at home. Archived data, and above all data to be transferred, is usually compressed, and must be decompressed after the transfer before it can be used. Three files (one 887 kB JPG image, one 1468 kB text file, and one 1280 kB executable) are first compressed as many times as possible during 10 seconds. Next, the archive is decompressed in a loop for 10 seconds. The result unit for both compression and decompression is MBytes/s, i.e. how many megabytes of data are compressed/decompressed per second.

Technical details:

This test involves only integer computing. Three uncompressed files (one text file and two binary files) are used as the data source for the compression and decompression tests. Each individual file is completely loaded into a 1 MB memory buffer before compression. Another buffer of 500 kB serves as the compressed data buffer, and the time it takes to compress/decompress the data is recorded as an indication of CPU speed. The LZ77 compression method from the open source ZLIB library (www.gzip.org/zlib/) is used, so anybody can do an in-depth analysis of what kinds of CPU-intensive tasks are involved in this test.
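Because zlib is open source, the same measurement is easy to reproduce in miniature. The sketch below times zlib compression and decompression of a 1 MB in-memory buffer and reports MB/s; the synthetic data, buffer size, and short durations are illustrative choices, not the benchmark's actual files or timings.

```python
import time
import zlib

def mb_per_s(work, nbytes, duration=0.2):
    """Repeat `work` for `duration` seconds; return MB of `nbytes`-sized
    payloads processed per second."""
    start = time.perf_counter()
    deadline = start + duration
    done = 0
    while True:
        work()
        done += nbytes
        now = time.perf_counter()
        if now >= deadline:
            break
    return done / (now - start) / 2**20

# 1 MB of synthetic, somewhat compressible data stands in for the test files.
data = bytes(range(256)) * 4096
packed = zlib.compress(data, 6)

compress_rate = mb_per_s(lambda: zlib.compress(data, 6), len(data))
decompress_rate = mb_per_s(lambda: zlib.decompress(packed), len(data))
```

As in the benchmark, both rates are expressed in megabytes of uncompressed data per second, so the two numbers are directly comparable.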

CPU test - Text search
In this test, a number of search operations are performed on a large text file. This test was chosen because text search is a common task when browsing the web, using e-mail, and dealing with documents in general. This test contains only integer operations.

Technical details:

The test uses the Boyer-Moore algorithm, which is considered the most efficient string-matching algorithm for typical applications. A 1.5 MB text file is loaded into system memory before the search starts. A number of words frequently appearing in the text are then searched for. The text search is looped for 10 seconds and the number of executed loops is the test result.
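The key idea of Boyer-Moore is that a mismatch lets the pattern skip ahead by more than one position. The sketch below implements the simplified bad-character variant (the full algorithm adds a good-suffix rule as well); it is an illustration of the technique, not the benchmark's source code.

```python
def boyer_moore(text, pattern):
    """Return the index of the first occurrence of `pattern` in `text`,
    or -1, using the bad-character heuristic of Boyer-Moore."""
    # Bad-character table: last index at which each character occurs.
    last = {c: i for i, c in enumerate(pattern)}
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    i = m - 1   # position in text
    j = m - 1   # position in pattern (compared right to left)
    while i < n:
        if text[i] == pattern[j]:
            if j == 0:
                return i          # full match found
            i -= 1
            j -= 1
        else:
            # Shift the pattern so its last occurrence of text[i] (if any)
            # lines up under the mismatch, then restart from its right end.
            lo = last.get(text[i], -1)
            i += m - min(j, lo + 1)
            j = m - 1
    return -1
```

The benchmark repeats many such searches over the 1.5 MB text and counts completed passes, rather than timing a single search.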

CPU test - Audio Conversion
An MP3-compressed audio file is decompressed and compressed using the public audio compression format Ogg Vorbis (http://www.gnu.org/directory/oggvorbis.html). The Microsoft MP3 decoder (shipped as a DirectShow filter with DirectX) is used for decompression. As MP3 encoders carry high license fees, the Ogg Vorbis encoder is used instead. Since the Ogg Vorbis file format is very close to MP3, this test offers CPU workloads corresponding to both MP3 playback and compression.

Technical details:

A 30-second 128 kbit/s MP3 stream (500 kB compressed size) is decompressed to 44 kHz/16-bit PCM format and simultaneously compressed to the Ogg Vorbis format. DirectShow is used to construct a proper filter graph for performing the task, and the time spent on the conversion is measured. The filters participating in the graph are: File Source (Async.), MPEG-I Stream Splitter, MPEG Layer-3 Decoder, Vorbis Stream Encoder, AVI Mux, and Null Renderer. The same graph as in the test can be constructed with a tool called GraphEdit (shipped with the DirectX 8 SDK). All filters except the Vorbis Stream Encoder are shipped with DirectX 8. The Vorbis Stream Encoder is shipped with PCMark2002 and registered during installation. It can also be downloaded from http://sourceforge.net/projects/mediaxw/

CPU test - 3D Vector Calculation
Simulating hair on a human head is something we already see in the most advanced real-time 3D demos, and something that is expected to show up in games in the near future. The task can be approximated in a number of ways, but if done realistically, each hair is an object of a large number of polygons. The hair object bends slightly at a large number of small joints. The required floating-point 3D transformation calculations get even more complex, because the position of each joint depends on all joints between it and the root of the hair. The test is run for 10 seconds and the test result is the number of frames completed in that time.

Technical details:

This test simulates hair physics and lighting. The hair is modeled using polygon lines, with 7300 hairs of 8 nodes each, consuming 7300 * (24 + 8*24) = 1576800 ~= 1.5 MB of memory. The physics code processes all hair vertices in a single pass. Vertices are affected by three forces: gravity, hair straightening, and curling. However, the simulation is not physically correct. For example, hair mass is not simulated properly and gravity is approximated with velocity (instead of acceleration). Lighting is anisotropic: vertex normals are updated so that they point towards the light in the plane of the hair. The test is coded using MAX-FX's vector template library. The code consists mainly of vector addition, dot and cross products, and vector normalization using 32-bit floating-point values. Normalization is optimized in assembly language and uses table lookups.
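A much-simplified sketch of this kind of joint-chain update is shown below for a single strand. All constants (segment length, gravity strength, blend factors, root direction) are assumed values for illustration; this is not the MAX-FX implementation, and the anisotropic lighting pass is omitted. It does show the key property from the text: each joint's new position depends on every joint between it and the root.

```python
import math

def v_add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def v_sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def v_scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def v_norm(a):
    length = math.sqrt(a[0]**2 + a[1]**2 + a[2]**2)
    return v_scale(a, 1.0 / length)

SEG = 1.0                    # joint-to-joint segment length (assumed)
GRAVITY = (0.0, -1.0, 0.0)   # gravity direction (assumed)

def step(nodes):
    """Advance one hair strand by one step. The root node is fixed; each
    joint feels gravity (applied as a displacement, i.e. as velocity, as in
    the test), is pulled toward the straightened direction of the previous
    segment, and is re-constrained to the fixed segment length."""
    out = [nodes[0]]
    for k in range(1, len(nodes)):
        prev = out[k - 1]
        if k >= 2:
            straight = v_norm(v_sub(out[k - 1], out[k - 2]))
        else:
            straight = (0.0, -1.0, 0.0)                   # assumed root direction
        target = v_add(prev, v_scale(straight, SEG))
        p = v_add(nodes[k], v_scale(GRAVITY, 0.01))       # gravity as velocity
        p = v_add(p, v_scale(v_sub(target, p), 0.2))      # straightening force
        out.append(v_add(prev, v_scale(v_norm(v_sub(p, prev)), SEG)))
    return out
```

Note how the workload is dominated by vector addition, subtraction, and normalization, matching the description of the test's floating-point profile.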

The overall CPU score is calculated from the results of tests described above according to the formula:

CPU Score =
{ JPEGDecompression*60.6 + (FileCompression*153.8 + FileDecompression*12.4)/2 + TextSearch*4.9 + AudioConversion*11.1 + 3DVectorCalculation*16.7 }

This formula was obtained by separately fixing the individual results on multiple manufacturers' high-end systems to a reference point. The final weightings (multipliers) were then averaged from the individual weightings.
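Plugging hypothetical per-test results into the formula shows how the weighted components combine; the input values below are invented for illustration and are not measured results.

```python
# Hypothetical individual results, in the units described above
# (Mpixels/s, MB/s, search loops, etc.) -- not measured values.
jpeg_decompression = 16.5
file_compression   = 6.5
file_decompression = 80.0
text_search        = 204.0
audio_conversion   = 90.0
vector_calculation = 60.0

cpu_score = (jpeg_decompression * 60.6
             + (file_compression * 153.8 + file_decompression * 12.4) / 2
             + text_search * 4.9
             + audio_conversion * 11.1
             + vector_calculation * 16.7)
# With these hypothetical inputs the score lands near the ~5000-point
# level quoted for a launch-time high-end PC.
```

Note that the compression and decompression results are averaged into a single component before being added to the other five weighted terms.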

A high-end PC at the time of the PCMark2002 launch should get around 5000 points as its CPU score.

Memory tests

Memory Tests - Overall
The Memory tests measure the performance of the memory subsystem. Read, write, read-modify-write, and random access operations are tested on the system memory, L2 and L1 cache. These tests are implemented in the same manner as memory accesses in normal applications, and are not optimized to achieve maximum throughput. However, since no other tasks are run while performing the memory transfers, quite high throughput numbers can be expected.

Memory tests - Raw Access and Random Access
The purpose of the memory tests is to stress only the memory sub-system of the computer. They provide theoretical figures that might be reached in practical implementations with optimal memory usage. In practice, the CPU always does some work with the data it fetches from memory, but because the balance between CPU and memory usage can vary depending on the algorithm and implementation details, it is also necessary to test what kind of limitations the memory sub-system imposes on total performance. We stress the memory sub-system with different sizes of straight memory read, write, and read-modify-write operations, and by browsing through an STL list whose elements lie pseudo-randomly in memory.

Technical details:

Raw read, write, and read-modify-write operations are performed starting from a 3072 kB array, decreasing in size to 1536 kB, 384 kB, 48 kB, and finally 6 kB. Each block size is tested for two seconds and the amount of accessed data is given as the result. In the STL container test, a list of 116-byte elements is constructed and sorted by an integer pseudo-random key. The list is then iterated through as many times as possible for 2 seconds and the total size of the accessed elements is given as the result. There are 6 runs of this test, with 24576 items in the largest run corresponding to a total data amount of 1536 kB, decreasing in size to 12288 items (768 kB), 6144 items (384 kB), 1536 items (96 kB), 768 items (48 kB), and 96 items in the smallest run corresponding to 6 kB of total data.
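The random-access part of the test can be sketched as a pointer chase: a permutation scatters the traversal order, mimicking list elements placed pseudo-randomly in memory by the random sort key. This is a toy model in Python (element size and durations taken from the text, everything else assumed), not the benchmark's C++ STL code.

```python
import random
import time

ELEMENT = 116  # bytes per list element, as in the STL container test

def make_chain(n_items, seed=0):
    """Build a single-cycle pseudo-random traversal order, mimicking a
    sorted list whose elements are scattered in memory."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    nxt = [0] * n_items
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b
    return nxt

def chase_rate(nxt, duration=0.1):
    """Follow the chain repeatedly for `duration` seconds; return MB/s of
    element data touched (len(nxt) * 116 bytes per full pass)."""
    start = time.perf_counter()
    deadline = start + duration
    touched = 0
    i = 0
    while True:
        for _ in range(len(nxt)):
            i = nxt[i]
        touched += len(nxt) * ELEMENT
        now = time.perf_counter()
        if now >= deadline:
            break
    return touched / (now - start) / 2**20
```

Running this with the item counts from the text (96 up to 24576) would sweep the working set from L1-cache size up past the L2 cache, which is exactly what makes the per-size results informative.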

Video memory sub-system test
This test is built to stress the video memory sub-system in the way regular Windows desktop applications utilize it. The test pushes both the graphics card's internal memory bandwidth and the AGP bus transfer speed. Common desktop application usage involves scrolling documents and moving/resizing windows, which above all works the video memory sub-system in the same manner as this test does. The results of this test affect the total Memory score.

Technical details:

The test starts by creating a back-buffered primary surface in 1024x768 32-bit resolution in DX exclusive mode, and one off-screen work surface (twice as high) in the same pixel format. During the test, the work surface is updated each frame by transferring data over the AGP bus, where the amount of data transferred depends on the scrolling speed (1, 4, 16, and 32 scan lines/frame). The work surface is blitted every frame to the primary surface to stress the internal memory bandwidth. Each speed is tested for 3 seconds, and the number of frame updates made during that time is given as the result.

PCMark2002 Overall Memory Score is calculated as follows:

Memory Score =
{ Read(3072*32+1536*32+384*16+48*1+6*1) + Write(3072*32+1536*32+384*16+48*1+6*1) + Modify(3072*64+1536*64+384*32+48*2+6*2) + Container(1536*64+768*64+384*32+96*4+48*2+6*2) + VideoMem(1*4+4*8+16*16+32*32) } / 160

This formula was obtained by separately fixing the individual results on multiple manufacturers' high-end systems to a reference point. The final weightings (multipliers) were then averaged from the individual weightings.
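The formula is a weighted sum over each test's per-size results, divided by 160. The sketch below encodes the weights from the formula and verifies them with a degenerate input (every result fixed at 1.0, so the score reduces to the weight sum over 160); real inputs would be the measured MB/s and frame-update figures.

```python
# Weights from the Memory Score formula, keyed by test size
# (kB for the memory tests, scroll lines/frame for the video memory test).
RW_WEIGHTS        = {3072: 32, 1536: 32, 384: 16, 48: 1, 6: 1}
MODIFY_WEIGHTS    = {3072: 64, 1536: 64, 384: 32, 48: 2, 6: 2}
CONTAINER_WEIGHTS = {1536: 64, 768: 64, 384: 32, 96: 4, 48: 2, 6: 2}
VIDEO_WEIGHTS     = {1: 4, 4: 8, 16: 16, 32: 32}

def memory_score(read, write, modify, container, videomem):
    """Each argument maps test size to its result; returns the weighted
    sum divided by 160, per the formula above."""
    total = (sum(read[s] * w for s, w in RW_WEIGHTS.items())
             + sum(write[s] * w for s, w in RW_WEIGHTS.items())
             + sum(modify[s] * w for s, w in MODIFY_WEIGHTS.items())
             + sum(container[s] * w for s, w in CONTAINER_WEIGHTS.items())
             + sum(videomem[s] * w for s, w in VIDEO_WEIGHTS.items()))
    return total / 160

# Sanity check: with every result fixed at 1.0 the score reduces to the
# sum of all weights divided by 160.
ones = lambda weights: {s: 1.0 for s in weights}
baseline = memory_score(ones(RW_WEIGHTS), ones(RW_WEIGHTS),
                        ones(MODIFY_WEIGHTS), ones(CONTAINER_WEIGHTS),
                        ones(VIDEO_WEIGHTS))
```

The weighting makes the large (system-memory-sized) transfers dominate the score, with the cache-sized runs contributing only marginally.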

A high-end PC, at the time of the PCMark2002 launch, should get around 5000 points as total Memory score.

Hard disk drive tests

HDD Tests - Overall

The HDD (Hard Disk Drive) tests measure the performance of the HDD subsystem. Read and write operations are performed both cached and uncached. These tests are implemented in the same manner as accesses in normal applications, and are not optimized to achieve maximum throughput. However, since no other tasks are run while performing the data transfers, quite high throughput numbers can be expected.

The hard drive tests simulate the usage of a hard disk in real applications, such as writing, reading, and copying files. At the beginning of program execution, all logical disks are examined, and those drives with enough free space are listed in the Options dialog of the GUI. By default, the first logical drive (counted from C: onwards) with the necessary free space is tested. The selection 'All' will test all logical drives and report the result of the fastest HDD.

Technical details:

A temporary "__PCMARK" directory is created in the root directory of the selected logical disk. 8x8x8 subdirectories are then created recursively in this temporary directory. The time it takes to create this directory structure is not measured, because the directory creation time is very short and varies a lot from one run to another.
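Creating the nested directory tree is a small recursive task; the sketch below builds it, assuming "8x8x8" means 8 subdirectories at each of 3 levels. The directory naming is invented for illustration, and the demonstration uses a 2x2 tree in a temporary directory to keep it quick.

```python
import os
import tempfile

def make_tree(root, fanout=8, depth=3):
    """Create `fanout` subdirectories at each of `depth` levels; with the
    defaults this builds the 8x8x8 structure used by the HDD tests."""
    if depth == 0:
        return
    for i in range(fanout):
        sub = os.path.join(root, "d%d" % i)   # naming scheme assumed
        os.mkdir(sub)
        make_tree(sub, fanout, depth - 1)

# Small demonstration (2x2 instead of 8x8x8):
with tempfile.TemporaryDirectory() as td:
    make_tree(td, fanout=2, depth=2)
    n_dirs = sum(len(dirs) for _, dirs, _ in os.walk(td))
```

With the full 8x8x8 defaults this produces 8 + 64 + 512 = 584 directories, of which the 512 leaves receive the randomly placed test files.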

File write test
18 files are created in this test, with sizes varying from 1 kB to 128 MB and a total of 256 MB of data written. The average write rate (MB/s) is given as the result. To simulate random writing behaviour, these 18 files are placed randomly into the directories just created. A 1 MB memory buffer is used to create the raw data for writing; for files larger than 1 MB the same buffer is used several times. File writing is tested in two modes, cached and uncached. In uncached mode, the file buffer is always flushed when beginning to write a file. In cached mode, each file is written once, and the file system should then copy the data to the file cache before copying it to the hard disk. Cached writes might therefore be slower than uncached write operations. In normal applications cached data is usually reused, which gives a considerable overall performance increase.
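The write measurement can be sketched as follows: a file of the requested size is written from a reused 1 MB buffer and the elapsed time yields MB/s. The `flush` flag here uses a portable `fsync` as a rough stand-in for the test's uncached mode (the real test uses Windows-specific file flags), so this is an approximation of the technique, not the benchmark's code.

```python
import os
import tempfile
import time

BUF = bytes(1024 * 1024)  # 1 MB source buffer, reused for larger files

def write_file(path, size_bytes, flush=False):
    """Write `size_bytes` to `path` from the 1 MB buffer; return MB/s.
    flush=True forces the data to disk (a rough, portable stand-in for
    the test's uncached mode)."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            chunk = BUF[:min(remaining, len(BUF))]
            f.write(chunk)
            remaining -= len(chunk)
        if flush:
            f.flush()
            os.fsync(f.fileno())
    return size_bytes / (time.perf_counter() - start) / 2**20
```

Averaging this rate over 18 files of mixed sizes, scattered across the directory tree, gives the test's reported write result.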

File Read test
The read test is based on the directory structure and files created earlier. Again, a 1 MB buffer is used for storing temporary data, and the files are read in random order. The average read rate (MB/s) is given as the result. The Read test is also done in cached and uncached modes. In cached read mode the file system has to copy the data from the hard disk to the file cache before the data goes to the memory buffer, which usually makes a cached read slower than an uncached read.

File copy test
This test copies all test files into a predefined directory, with different names. The purpose of this test is to combine the read and write operations. After the copy operation, the temporary files are deleted. The average data transfer rate (MB/s) is given as the result.

File System impact on the score
Windows reserves a block of memory as the cache for file operations. This cache can be up to 80% of the total system memory. Before each test, we just flush the file buffer to minimize the use of cached data in the tests.

PCMark2002 Overall HDD Score is calculated as follows:

HDD Score =
{ Write(Cached*8+UnCached*4) + Read(Cached*8+UnCached*4) + Copy*16 }

The hard drive test scales differently than the CPU and memory tests. Therefore, CPU and memory total score values should not be compared to the overall HDD Score value.
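The HDD formula is a straightforward weighted sum of the five transfer rates. The sketch below evaluates it with invented MB/s figures, purely to make the weighting concrete.

```python
def hdd_score(write_cached, write_uncached, read_cached, read_uncached, copy):
    """Weighted sum from the HDD Score formula; inputs are MB/s results."""
    return (write_cached * 8 + write_uncached * 4
            + read_cached * 8 + read_uncached * 4
            + copy * 16)

# Hypothetical transfer rates, for illustration only:
score = hdd_score(25.0, 30.0, 28.0, 32.0, 20.0)
```

Note that the copy result, which exercises both reading and writing, carries the single largest weight.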

Crunch test

The Crunch test runs 3 tests simultaneously:

These tests are all described above, and can be run separately in PCMark2002. The idea of the Crunch test is to see how well the system can perform several tasks simultaneously, stressing different parts of the system concurrently. It is therefore recommended that, in addition to running the Crunch test on various systems for comparison, the included tests also be run separately, in order to compare how much performance is lost when running the tests concurrently.

Windows XP 2D tests

A number of GDI+ tests are used to demonstrate the new 2D functionality of the Windows XP graphical user interface (GUI). Unfortunately, these tests scale mostly with overall system performance rather than with graphics card speed. If some GDI+ features are hardware accelerated, several of the tests might show exceptional performance. Taking this into account, these tests are more feature presentations than the other tests in PCMark2002. They have been moved to the Pro version in the hope that Pro version users are more likely to understand these tests, and why the results scale as they do.

The following GDI+ features are tested:

Some tests have been made a bit artistic in order to make them more pleasant to watch, but this extra styling has minimal impact on the performance.

Video tests

There are 4 kinds of Video tests in PCMark2002:

NOTE: The ASF encoding and playback tests require that Microsoft Windows Media Encoder 7.1 and Microsoft Windows Media Player 7.1 are installed. DVD playback and the video quality tests require that a DVD playback program is installed. During development the following DVD players were successfully tested:

ASF compression
ASF video clips are generated from an MPEG-1 clip using the Windows Media Encoder. Two clips are generated: a low-resolution clip with a resolution of 352x288, and a high-resolution clip, corresponding to DivX, with a resolution of 640x480.

ASF playback
ASF clips are played, and the CPU load is measured during playback. Two clips are included in the benchmark: A low-resolution clip with a resolution of 352x288, and a high-resolution clip corresponding to DivX, with a resolution of 640x480.

DVD playback
This test measures normal DVD playback performance. It will be run only if DVD playback software is installed; PCMark2002 does not contain a default decoder. This is due to the wide variety of highly optimized players, which would make the choice of any default player unfair. The CPU load is measured during playback of two clips with a resolution of 720x480, one with a 4 Mbit/s bit rate and one with a variable 7-12 Mbit/s bit rate.

Video quality
DVD playback quality is tested with a series of de-interlacing tests. De-interlacing is the process of transforming video-originated interlaced content into progressive format for display on a standard PC monitor. Most de-interlacing solutions currently in use are far from optimal, resulting in different types of artifacts when, for example, motion or graphic elements are present in the content. The de-interlacing tests use a DVD player installed by the user to play back the test bitstreams. PCMark2002 contains no default DVD playback software.

Different DVD playback software can produce different test results. The tests and their bitstreams are designed to produce a noticeable artifact when a sub-optimal method is used to de-interlace the video. The clips contain information of their interlaced nature instructing the decoder and graphics adapter to do their best. These tests are especially designed to reveal which of the common de-interlacing methods is currently in use, by highlighting the resulting imperfection artifacts. There are other measurements of video playback quality, but the PCMark2002 video quality tests do not deal with those.

The Line Flicker Test contains a clip with two strips of static horizontal lines, 5 lines on the odd field and 5 on the even field. If this sequence is de-interlaced using the basic BOB method, heavy flickering will be visible. If this bitstream is being BOB’ed and thus flickers, the test FAILS. If the image remains still, the test PASSES. If the ODD and EVEN lines are not both visible, the test also FAILS.

The Feathering Test has a moving pendulum and contains fast motion. Clear feathering will be visible if the decoder is not doing any de-interlacing at all, i.e. it is just "weaving" the odd and even fields together. If feathering is visible, the test FAILS; if not, the test PASSES. If the two letters of the word "OK" are not both visible, the test also FAILS.

Both BOB and Weave have their benefits and drawbacks. Video material with slow movements or completely still content will in general look better using Weave than BOB, since BOB causes flickering on horizontal lines. BOB, on the other hand, is better for displaying fast-moving content, where Weave artifacts are usually visible. BOB and Weave are usually considered about equal as de-interlacing methods, if no refinements are added to either method.
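The difference between the two basic methods can be illustrated on fields represented simply as lists of scan lines; this is a toy model of the frame assembly step, not a video pipeline.

```python
def weave(odd_field, even_field):
    """Interleave the two fields into one full frame with no processing.
    Rows 0, 2, 4, ... come from the even field, rows 1, 3, 5, ... from
    the odd; with fast motion the fields disagree, causing feathering."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    return frame

def bob(field):
    """Line-double a single field: each field line is repeated, so the
    vertical detail of the other field is discarded. Thin one-field
    horizontal lines then appear and disappear every frame (flicker)."""
    frame = []
    for line in field:
        frame.append(line)
        frame.append(line)
    return frame
```

In the Line Flicker Test, BOB drops one strip of lines per displayed frame, producing the flicker; in the Feathering Test, Weave combines two fields captured at different instants, producing the comb-like feathering.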

The next two de-interlacing tests will be run ONLY if the system passes both the line flicker and feathering tests!

The Double Imaging Test uses the same moving pendulum as the feathering test to determine the de-interlacing quality with more precision. For example, if the system uses Median Filtering to de-interlace the video, it will pass both the previous tests but show clear double imaging in this test.

The Jagged Edges Test is designed to be passed only by the systems capable of doing very high-quality de-interlacing such as Directional Correlational De-interlacing or Motion Vector Steered de-interlacing.

The following table shows the video quality tests horizontally in testing order, and vertically the typical de-interlacing methods. Compare your video quality results with the rows in the table below to find out what de-interlacing method your system is using.

                 Line Flicker Test   Feathering Test   Double Imaging Test   Jagged Edges Test
  Single Field   FAIL                FAIL              N/A                   N/A
  BOB            FAIL                PASS              N/A                   N/A
  Weave          PASS                FAIL              N/A                   N/A
  Median         PASS                PASS              FAIL                  FAIL
  Advanced       PASS                PASS              PASS                  PASS

Tearing is an artifact caused by incorrect synchronization of video-surface page flips. It causes the image to "tear" horizontally into two or more parts. The last video quality test should show noticeable artifacts if the playback causes tearing.

Battery test

This test loops a selection of PCMark2002 tests from a full battery charge until a user-defined charge remains. The user is alerted if the Battery test is started with a charge of less than 95%. Both the number of completed test loops and the results of the tests are reported in the scores, because the figures might be lower when running a laptop on battery power due to power saving features. The test has recommended default settings, but the user is encouraged to use custom settings that meet the needs of the intended test round.

NOTE: If the Battery test is run according to the default settings, PCMark2002 will give a Battery score based on the successfully completed test loops. An initial minimum battery charge of 95% is required. Individual test result values do not affect the Battery score.

The battery test works as follows:

  1. Select the tests to loop in the battery test. By default only the CPU tests are selected. This is a good choice because the testing loop is then short enough that the laptop will most likely complete several loops before the given battery charge limit is reached. This makes it easier to compare the battery life of several laptops.
  2. Choose the lower battery charge limit, i.e. when the test should end. A high battery charge percentage makes the test run for a shorter time. On the other hand, battery charge, and the indication of it, vary between laptops, so a lower battery charge limit is more likely to deliver a reliable comparison. By default the test ends when the battery charge is 40%. This is an appropriate value, since some laptops already power down at around 30% remaining charge.
  3. There is also an optional delay between the test loops, where the delay time can be specified. The idea is that during this delay the power saving features have time to operate for a while, before the next test loop is activated, probably running the system at 100% load again. This option was implemented because laptops running on the battery are far from always under a 100% system load. By default there is no idle time between the test loops.
  4. Specify a text file where the results will be saved. This file is updated throughout the test, so the test can be run until a battery power failure occurs and the results still remain on the HDD of the laptop.
  5. Select whether the results of the last test round overwrite the previous ones, or whether the results of each new test round are appended to the existing result file. Logging the results of all test rounds is recommended, since advanced power saving features might deliver lower performance as the battery charge decreases, and this difference will be visible in such a result file. By default the results of each test loop are appended to the result file.

The selected tests will then be looped until the desired battery charge remains. When the Battery test is complete, the result file will be displayed.
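The overall control flow of the Battery test can be sketched as a loop that runs the selected tests, appends each round's results to the log file (so they survive a power failure), and stops when the charge limit is reached. The `get_charge` callable below is a hypothetical stand-in for the laptop's battery gauge, and the demo drives it with simulated readings.

```python
import os
import tempfile
import time

def battery_test(tests, charge_limit, result_path, get_charge, delay=0.0):
    """Loop `tests` until get_charge() (a hypothetical battery-gauge
    callable returning percent remaining) drops to `charge_limit`,
    appending each loop's results to `result_path`; return loop count."""
    loops = 0
    while get_charge() > charge_limit:
        results = [t() for t in tests]
        with open(result_path, "a") as f:   # append mode keeps every round
            f.write("loop %d: %s\n" % (loops, results))
        loops += 1
        if delay:
            time.sleep(delay)   # optional idle time between loops
    return loops

# Demo with a simulated battery draining 100% -> 30% and a 40% limit:
_charges = iter([100, 80, 60, 30])
with tempfile.TemporaryDirectory() as td:
    loops = battery_test([lambda: 1], 40,
                         os.path.join(td, "results.txt"),
                         lambda: next(_charges))
```

The returned loop count is the primary comparison figure between laptops; the logged per-loop results additionally reveal whether performance drops as the charge decreases.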

It is recommended to use the Battery test for laptop battery comparison. Select a small number of tests, since the Battery test will not loop many times when all tests are selected, which makes comparison harder. Select the tests according to the type of laptop battery workload you want (CPU, memory, HDD); the video performance tests offer a good combined workload. Select a wide battery charge range, since battery discharge is seldom linear. By default, the battery charge range is 95% - 40%. Run the Battery test with the same settings on all laptops to be tested. The most important result is the number of test loops each laptop completes, since this reflects which laptop can operate the longest without a recharge. The Battery test results also show the performance of the laptops, which is usually lower when running on the battery. Remember to take all possible power saving features into consideration, and try to run the Battery test with power saving settings as equal as possible on all laptops to be compared.

The Battery test results will be affected by the following issues: